[SPIR-V] Lower llvm.x.with.overflow intrinsics #95012

Open
wants to merge 2 commits into main

Conversation

michalpaszkowski
Member

This patch introduces lowering for the remaining llvm.x.with.overflow
intrinsics. The proposed implementation does not rely on OpIAddCarry
for sadd_with_overflow or the equivalent for ssub_with_overflow.

Also, the patch reorders the SPIRVPrepareFunctions pass so that it runs
after CodeGenPrepare.

The changes push the compilation of vector/scalar_access.cpp and other
DPC++ E2E tests further with -O2 or higher optimization levels.

This change makes sure the preferred switch condition integer type size
remains the same throughout CodeGen optimizations.

The change fixes running several OpenCL applications with -O2 or higher
opt levels, and fixes Basic/stream/stream_max_stmt_exceed.cpp DPC++ E2E
test with -O2.
@llvmbot
Collaborator

llvmbot commented Jun 10, 2024

@llvm/pr-subscribers-backend-spir-v

Author: Michal Paszkowski (michalpaszkowski)

Changes

This patch introduces lowering for the remaining llvm.x.with.overflow
intrinsics. The proposed implementation does not rely on OpIAddCarry
for sadd_with_overflow or the equivalent for ssub_with_overflow.

Also, the patch reorders the SPIRVPrepareFunctions pass so that it runs
after CodeGenPrepare.

The changes push the compilation of vector/scalar_access.cpp and other
DPC++ E2E tests further with -O2 or higher optimization levels.


Full diff: https://github.com/llvm/llvm-project/pull/95012.diff

7 Files Affected:

  • (modified) llvm/lib/Target/SPIRV/SPIRVISelLowering.h (+5)
  • (modified) llvm/lib/Target/SPIRV/SPIRVPrepareFunctions.cpp (+75-27)
  • (modified) llvm/lib/Target/SPIRV/SPIRVTargetMachine.cpp (+1-1)
  • (modified) llvm/test/CodeGen/SPIRV/assume.ll (+4-4)
  • (added) llvm/test/CodeGen/SPIRV/llvm-intrinsics/smul.with.overflow.ll (+54)
  • (modified) llvm/test/CodeGen/SPIRV/llvm-intrinsics/umul.with.overflow.ll (+1-1)
  • (added) llvm/test/CodeGen/SPIRV/optimizations/switch-condition-type.ll (+18)
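
To make the diff below easier to follow, here is a rough LLVM IR sketch of what the pass produces for a 32-bit llvm.sadd.with.overflow call: the intrinsic call is redirected to an ordinary function whose body carries the expansion, mirroring the pre-existing umul lowering. This is an illustrative sketch, not code taken from the patch; the function name is assumed from the spirv.llvm_*_with_overflow_* convention used in the tests.

define { i32, i1 } @"spirv.llvm_sadd_with_overflow_i32"(i32 %a, i32 %b) {
entry:
  ; Plain signed add for the result (the pass uses CreateNSWAdd); overflow is
  ; detected from signs: (a > 0 && b > 0 && r < 0) || (a < 0 && b < 0 && r > 0).
  %r = add nsw i32 %a, %b
  %a.pos = icmp sgt i32 %a, 0
  %b.pos = icmp sgt i32 %b, 0
  %r.neg = icmp slt i32 %r, 0
  %pos.and = and i1 %a.pos, %b.pos
  %ovf.pos = and i1 %pos.and, %r.neg
  %a.neg = icmp slt i32 %a, 0
  %b.neg = icmp slt i32 %b, 0
  %r.pos = icmp sgt i32 %r, 0
  %neg.and = and i1 %a.neg, %b.neg
  %ovf.neg = and i1 %neg.and, %r.pos
  %ovf = or i1 %ovf.pos, %ovf.neg
  ; Rebuild the { result, overflow } aggregate returned by the intrinsic.
  %agg = insertvalue { i32, i1 } poison, i32 %r, 0
  %ret = insertvalue { i32, i1 } %agg, i1 %ovf, 1
  ret { i32, i1 } %ret
}
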
diff --git a/llvm/lib/Target/SPIRV/SPIRVISelLowering.h b/llvm/lib/Target/SPIRV/SPIRVISelLowering.h
index 6fc200abf4627..77356b7512a73 100644
--- a/llvm/lib/Target/SPIRV/SPIRVISelLowering.h
+++ b/llvm/lib/Target/SPIRV/SPIRVISelLowering.h
@@ -68,6 +68,11 @@ class SPIRVTargetLowering : public TargetLowering {
   // extra instructions required to preserve validity of SPIR-V code imposed by
   // the standard.
   void finalizeLowering(MachineFunction &MF) const override;
+
+  MVT getPreferredSwitchConditionType(LLVMContext &Context,
+                                      EVT ConditionVT) const override {
+    return ConditionVT.getSimpleVT();
+  }
 };
 } // namespace llvm
 
diff --git a/llvm/lib/Target/SPIRV/SPIRVPrepareFunctions.cpp b/llvm/lib/Target/SPIRV/SPIRVPrepareFunctions.cpp
index 7bee87d7204ed..4c3b6a6a24737 100644
--- a/llvm/lib/Target/SPIRV/SPIRVPrepareFunctions.cpp
+++ b/llvm/lib/Target/SPIRV/SPIRVPrepareFunctions.cpp
@@ -342,26 +342,70 @@ static void lowerFunnelShifts(IntrinsicInst *FSHIntrinsic) {
   FSHIntrinsic->setCalledFunction(FSHFunc);
 }
 
-static void buildUMulWithOverflowFunc(Function *UMulFunc) {
-  // The function body is already created.
-  if (!UMulFunc->empty())
+static void buildArithWithOverflowFunc(Function *Func, Intrinsic::ID ID) {
+  if (!Func->empty())
     return;
 
-  BasicBlock *EntryBB = BasicBlock::Create(UMulFunc->getParent()->getContext(),
-                                           "entry", UMulFunc);
+  BasicBlock *EntryBB =
+      BasicBlock::Create(Func->getParent()->getContext(), "entry", Func);
   IRBuilder<> IRB(EntryBB);
-  // Build the actual unsigned multiplication logic with the overflow
-  // indication. Do unsigned multiplication Mul = A * B. Then check
-  // if unsigned division Div = Mul / A is not equal to B. If so,
-  // then overflow has happened.
-  Value *Mul = IRB.CreateNUWMul(UMulFunc->getArg(0), UMulFunc->getArg(1));
-  Value *Div = IRB.CreateUDiv(Mul, UMulFunc->getArg(0));
-  Value *Overflow = IRB.CreateICmpNE(UMulFunc->getArg(0), Div);
-
-  // umul.with.overflow intrinsic return a structure, where the first element
-  // is the multiplication result, and the second is an overflow bit.
-  Type *StructTy = UMulFunc->getReturnType();
-  Value *Agg = IRB.CreateInsertValue(PoisonValue::get(StructTy), Mul, {0});
+  Value *LHS = Func->getArg(0);
+  Value *RHS = Func->getArg(1);
+
+  Value *Result;
+  Value *Overflow;
+  Type *StructTy = Func->getReturnType();
+
+  switch (ID) {
+  case Intrinsic::smul_with_overflow:
+    Result = IRB.CreateNSWMul(LHS, RHS);
+    Overflow = IRB.CreateICmpNE(IRB.CreateSDiv(Result, LHS), RHS);
+    break;
+  case Intrinsic::umul_with_overflow:
+    Result = IRB.CreateNUWMul(LHS, RHS);
+    Overflow = IRB.CreateICmpNE(IRB.CreateUDiv(Result, LHS), RHS);
+    break;
+  case Intrinsic::sadd_with_overflow:
+    // TODO: Implement using OpIAddCarry
+    Result = IRB.CreateNSWAdd(LHS, RHS);
+    // Overflow if (LHS > 0 && RHS > 0 && Result < 0) || (LHS < 0 && RHS < 0 &&
+    // Result > 0)
+    Overflow = IRB.CreateOr(
+        IRB.CreateAnd(IRB.CreateAnd(IRB.CreateICmpSGT(LHS, IRB.getInt32(0)),
+                                    IRB.CreateICmpSGT(RHS, IRB.getInt32(0))),
+                      IRB.CreateICmpSLT(Result, IRB.getInt32(0))),
+        IRB.CreateAnd(IRB.CreateAnd(IRB.CreateICmpSLT(LHS, IRB.getInt32(0)),
+                                    IRB.CreateICmpSLT(RHS, IRB.getInt32(0))),
+                      IRB.CreateICmpSGT(Result, IRB.getInt32(0))));
+    break;
+  case Intrinsic::uadd_with_overflow:
+    Result = IRB.CreateNUWAdd(LHS, RHS);
+    // Overflow occurs if the result is less than either of the operands.
+    Overflow = IRB.CreateICmpULT(Result, LHS);
+    break;
+  case Intrinsic::ssub_with_overflow:
+    Result = IRB.CreateNSWSub(LHS, RHS);
+    // Overflow if (LHS < 0 && RHS > 0 && Result > 0) || (LHS > 0 && RHS < 0 &&
+    // Result < 0)
+    Overflow = IRB.CreateOr(
+        IRB.CreateAnd(IRB.CreateAnd(IRB.CreateICmpSLT(LHS, IRB.getInt32(0)),
+                                    IRB.CreateICmpSGT(RHS, IRB.getInt32(0))),
+                      IRB.CreateICmpSGT(Result, IRB.getInt32(0))),
+        IRB.CreateAnd(IRB.CreateAnd(IRB.CreateICmpSGT(LHS, IRB.getInt32(0)),
+                                    IRB.CreateICmpSLT(RHS, IRB.getInt32(0))),
+                      IRB.CreateICmpSLT(Result, IRB.getInt32(0))));
+    break;
+  case Intrinsic::usub_with_overflow:
+    Result = IRB.CreateNUWSub(LHS, RHS);
+    // Overflow occurs if the result is greater than the left-hand-side operand.
+    Overflow = IRB.CreateICmpUGT(Result, LHS);
+    break;
+
+  default:
+    llvm_unreachable("Unsupported arithmetic with overflow intrinsic.");
+  }
+
+  Value *Agg = IRB.CreateInsertValue(PoisonValue::get(StructTy), Result, {0});
   Value *Res = IRB.CreateInsertValue(Agg, Overflow, {1});
   IRB.CreateRet(Res);
 }
@@ -407,18 +451,17 @@ static bool toSpvOverloadedIntrinsic(IntrinsicInst *II, Intrinsic::ID NewID,
   return true;
 }
 
-static void lowerUMulWithOverflow(IntrinsicInst *UMulIntrinsic) {
+static void lowerArithWithOverflow(IntrinsicInst *ArithIntrinsic) {
   // Get a separate function - otherwise, we'd have to rework the CFG of the
   // current one. Then simply replace the intrinsic uses with a call to the new
   // function.
-  Module *M = UMulIntrinsic->getModule();
-  FunctionType *UMulFuncTy = UMulIntrinsic->getFunctionType();
-  Type *FSHLRetTy = UMulFuncTy->getReturnType();
-  const std::string FuncName = lowerLLVMIntrinsicName(UMulIntrinsic);
-  Function *UMulFunc =
-      getOrCreateFunction(M, FSHLRetTy, UMulFuncTy->params(), FuncName);
-  buildUMulWithOverflowFunc(UMulFunc);
-  UMulIntrinsic->setCalledFunction(UMulFunc);
+  Module *M = ArithIntrinsic->getModule();
+  FunctionType *FuncTy = ArithIntrinsic->getFunctionType();
+  Type *RetTy = FuncTy->getReturnType();
+  const std::string FuncName = lowerLLVMIntrinsicName(ArithIntrinsic);
+  Function *Func = getOrCreateFunction(M, RetTy, FuncTy->params(), FuncName);
+  buildArithWithOverflowFunc(Func, ArithIntrinsic->getIntrinsicID());
+  ArithIntrinsic->setCalledFunction(Func);
 }
 
 // Substitutes calls to LLVM intrinsics with either calls to SPIR-V intrinsics
@@ -444,8 +487,13 @@ bool SPIRVPrepareFunctions::substituteIntrinsicCalls(Function *F) {
         lowerFunnelShifts(II);
         Changed = true;
         break;
+      case Intrinsic::sadd_with_overflow:
+      case Intrinsic::smul_with_overflow:
+      case Intrinsic::ssub_with_overflow:
+      case Intrinsic::uadd_with_overflow:
       case Intrinsic::umul_with_overflow:
-        lowerUMulWithOverflow(II);
+      case Intrinsic::usub_with_overflow:
+        lowerArithWithOverflow(II);
         Changed = true;
         break;
       case Intrinsic::assume:
diff --git a/llvm/lib/Target/SPIRV/SPIRVTargetMachine.cpp b/llvm/lib/Target/SPIRV/SPIRVTargetMachine.cpp
index 52fc6f33b4ef1..845113dd48650 100644
--- a/llvm/lib/Target/SPIRV/SPIRVTargetMachine.cpp
+++ b/llvm/lib/Target/SPIRV/SPIRVTargetMachine.cpp
@@ -177,11 +177,11 @@ void SPIRVPassConfig::addIRPasses() {
 
   TargetPassConfig::addIRPasses();
   addPass(createSPIRVRegularizerPass());
-  addPass(createSPIRVPrepareFunctionsPass(TM));
   addPass(createSPIRVStripConvergenceIntrinsicsPass());
 }
 
 void SPIRVPassConfig::addISelPrepare() {
+  addPass(createSPIRVPrepareFunctionsPass(TM));
   addPass(createSPIRVEmitIntrinsicsPass(&getTM<SPIRVTargetMachine>()));
   TargetPassConfig::addISelPrepare();
 }
diff --git a/llvm/test/CodeGen/SPIRV/assume.ll b/llvm/test/CodeGen/SPIRV/assume.ll
index fbf12ef184a89..88ff74535bde7 100644
--- a/llvm/test/CodeGen/SPIRV/assume.ll
+++ b/llvm/test/CodeGen/SPIRV/assume.ll
@@ -1,7 +1,7 @@
-; RUN: llc -mtriple=spirv32-unknown-unknown --spirv-ext=+SPV_KHR_expect_assume < %s | FileCheck --check-prefixes=EXT,CHECK %s
-; RUN: llc -mtriple=spirv64-unknown-unknown --spirv-ext=+SPV_KHR_expect_assume < %s | FileCheck --check-prefixes=EXT,CHECK %s
-; RUN: llc -mtriple=spirv32-unknown-unknown < %s | FileCheck --check-prefixes=NOEXT,CHECK %s
-; RUN: llc -mtriple=spirv64-unknown-unknown < %s | FileCheck --check-prefixes=NOEXT,CHECK %s
+; RUN: llc -O0 -mtriple=spirv32-unknown-unknown --spirv-ext=+SPV_KHR_expect_assume < %s | FileCheck --check-prefixes=EXT,CHECK %s
+; RUN: llc -O0 -mtriple=spirv64-unknown-unknown --spirv-ext=+SPV_KHR_expect_assume < %s | FileCheck --check-prefixes=EXT,CHECK %s
+; RUN: llc -O0 -mtriple=spirv32-unknown-unknown < %s | FileCheck --check-prefixes=NOEXT,CHECK %s
+; RUN: llc -O0 -mtriple=spirv64-unknown-unknown < %s | FileCheck --check-prefixes=NOEXT,CHECK %s
 
 ; EXT:        OpCapability ExpectAssumeKHR
 ; EXT-NEXT:   OpExtension "SPV_KHR_expect_assume"
diff --git a/llvm/test/CodeGen/SPIRV/llvm-intrinsics/smul.with.overflow.ll b/llvm/test/CodeGen/SPIRV/llvm-intrinsics/smul.with.overflow.ll
new file mode 100644
index 0000000000000..b336e22a145b4
--- /dev/null
+++ b/llvm/test/CodeGen/SPIRV/llvm-intrinsics/smul.with.overflow.ll
@@ -0,0 +1,54 @@
+; RUN: llc -O0 -mtriple=spirv32-unknown-unknown %s -o - | FileCheck %s --check-prefix=CHECK-SPIRV
+
+; CHECK-SPIRV: OpName %[[#NAME_SMUL_FUNC_8:]] "spirv.llvm_smul_with_overflow_i8"
+; CHECK-SPIRV: OpName %[[#NAME_SMUL_FUNC_32:]] "spirv.llvm_smul_with_overflow_i32"
+; CHECK-SPIRV: OpName %[[#NAME_SMUL_FUNC_VEC_I64:]] "spirv.llvm_smul_with_overflow_v2i64"
+
+define dso_local spir_func void @_Z4foo8hhPh(i8 zeroext %a, i8 zeroext %b, i8* nocapture %c) local_unnamed_addr {
+entry:
+  ; CHECK-SPIRV: %[[#]] = OpFunctionCall %[[#]] %[[#NAME_SMUL_FUNC_8]]
+  %smul = tail call { i8, i1 } @llvm.smul.with.overflow.i8(i8 %a, i8 %b)
+  %cmp = extractvalue { i8, i1 } %smul, 1
+  %smul.value = extractvalue { i8, i1 } %smul, 0
+  %storemerge = select i1 %cmp, i8 0, i8 %smul.value
+  store i8 %storemerge, i8* %c, align 1
+  ret void
+}
+
+define dso_local spir_func void @_Z5foo32jjPj(i32 %a, i32 %b, i32* nocapture %c) local_unnamed_addr {
+entry:
+  ; CHECK-SPIRV: %[[#]] = OpFunctionCall %[[#]] %[[#NAME_SMUL_FUNC_32]]
+  %smul = tail call { i32, i1 } @llvm.smul.with.overflow.i32(i32 %b, i32 %a)
+  %smul.val = extractvalue { i32, i1 } %smul, 0
+  %smul.ov = extractvalue { i32, i1 } %smul, 1
+  %spec.select = select i1 %smul.ov, i32 0, i32 %smul.val
+  store i32 %spec.select, i32* %c, align 4
+  ret void
+}
+
+define dso_local spir_func void @smulo_v2i64(<2 x i64> %a, <2 x i64> %b, <2 x i64>* %p) nounwind {
+  ; CHECK-SPIRV: %[[#]] = OpFunctionCall %[[#]] %[[#NAME_SMUL_FUNC_VEC_I64]]
+  %smul = call {<2 x i64>, <2 x i1>} @llvm.smul.with.overflow.v2i64(<2 x i64> %a, <2 x i64> %b)
+  %smul.val = extractvalue {<2 x i64>, <2 x i1>} %smul, 0
+  %smul.ov = extractvalue {<2 x i64>, <2 x i1>} %smul, 1
+  %zero = alloca <2 x i64>, align 16
+  %spec.select = select <2 x i1> %smul.ov, <2 x i64> <i64 0, i64 0>, <2 x i64> %smul.val
+  store <2 x i64> %spec.select, <2 x i64>* %p
+  ret void
+}
+
+; CHECK-SPIRV: %[[#NAME_SMUL_FUNC_8]] = OpFunction %[[#]]
+; CHECK-SPIRV: %[[#VAR_A:]] = OpFunctionParameter %[[#]]
+; CHECK-SPIRV: %[[#VAR_B:]] = OpFunctionParameter %[[#]]
+; CHECK-SPIRV: %[[#MUL_RES:]] = OpIMul %[[#]] %[[#VAR_A]] %[[#VAR_B]]
+; CHECK-SPIRV: %[[#DIV_RES:]] = OpSDiv %[[#]] %[[#MUL_RES]] %[[#VAR_A]]
+; CHECK-SPIRV: %[[#CMP_RES:]] = OpINotEqual %[[#]] %[[#DIV_RES]] %[[#VAR_B]]
+; CHECK-SPIRV: %[[#INSERT_RES:]] = OpCompositeInsert %[[#]] %[[#MUL_RES]]
+; CHECK-SPIRV: %[[#INSERT_RES_1:]] = OpCompositeInsert %[[#]] %[[#CMP_RES]] %[[#INSERT_RES]]
+; CHECK-SPIRV: OpReturnValue %[[#INSERT_RES_1]]
+
+declare { i8, i1 } @llvm.smul.with.overflow.i8(i8, i8)
+
+declare { i32, i1 } @llvm.smul.with.overflow.i32(i32, i32)
+
+declare {<2 x i64>, <2 x i1>} @llvm.smul.with.overflow.v2i64(<2 x i64>, <2 x i64>)
diff --git a/llvm/test/CodeGen/SPIRV/llvm-intrinsics/umul.with.overflow.ll b/llvm/test/CodeGen/SPIRV/llvm-intrinsics/umul.with.overflow.ll
index 406a23fa7d3df..7a4137b875fd2 100644
--- a/llvm/test/CodeGen/SPIRV/llvm-intrinsics/umul.with.overflow.ll
+++ b/llvm/test/CodeGen/SPIRV/llvm-intrinsics/umul.with.overflow.ll
@@ -42,7 +42,7 @@ define dso_local spir_func void @umulo_v2i64(<2 x i64> %a, <2 x i64> %b, <2 x i6
 ; CHECK-SPIRV: %[[#VAR_B:]] = OpFunctionParameter %[[#]]
 ; CHECK-SPIRV: %[[#MUL_RES:]] = OpIMul %[[#]] %[[#VAR_A]] %[[#VAR_B]]
 ; CHECK-SPIRV: %[[#DIV_RES:]] = OpUDiv %[[#]] %[[#MUL_RES]] %[[#VAR_A]]
-; CHECK-SPIRV: %[[#CMP_RES:]] = OpINotEqual %[[#]] %[[#VAR_A]] %[[#DIV_RES]]
+; CHECK-SPIRV: %[[#CMP_RES:]] = OpINotEqual %[[#]] %[[#DIV_RES]] %[[#VAR_B]]
 ; CHECK-SPIRV: %[[#INSERT_RES:]] = OpCompositeInsert %[[#]] %[[#MUL_RES]]
 ; CHECK-SPIRV: %[[#INSERT_RES_1:]] = OpCompositeInsert %[[#]] %[[#CMP_RES]] %[[#INSERT_RES]]
 ; CHECK-SPIRV: OpReturnValue %[[#INSERT_RES_1]]
diff --git a/llvm/test/CodeGen/SPIRV/optimizations/switch-condition-type.ll b/llvm/test/CodeGen/SPIRV/optimizations/switch-condition-type.ll
new file mode 100644
index 0000000000000..054520d2021b9
--- /dev/null
+++ b/llvm/test/CodeGen/SPIRV/optimizations/switch-condition-type.ll
@@ -0,0 +1,18 @@
+; RUN: llc -O2 -mtriple=spirv32-unknown-unknown %s -o - | FileCheck %s
+; RUN: %if spirv-tools %{ llc -O2 -mtriple=spirv32-unknown-unknown %s -o - -filetype=obj | spirv-val %}
+
+; CHECK: %[[#INT16:]] = OpTypeInt 16 0
+; CHECK: %[[#PARAM:]] = OpFunctionParameter %[[#INT16]]
+; CHECK: OpSwitch %[[#PARAM]] %[[#]] 1 %[[#]] 2 %[[#]]
+
+define i32 @test_switch(i16 %cond) {
+entry:
+  switch i16 %cond, label %default [ i16 1, label %case_one
+                                     i16 2, label %case_two ]
+case_one:
+  ret i32 1
+case_two:
+  ret i32 2
+default:
+  ret i32 3
+}

Collaborator

@efriedma-quic left a comment


Could you clarify why you need to do this? The normal flow is that GlobalISel converts the intrinsic call to a generic instruction (G_SMULO etc.), and then LegalizeHelper converts that instruction into a legal instruction.

Contributor

@arsenm left a comment


This should just go through the regular legalization process. You don't need to invent your own IR-level expansion.

Comment on lines +72 to +75
MVT getPreferredSwitchConditionType(LLVMContext &Context,
                                    EVT ConditionVT) const override {
  return ConditionVT.getSimpleVT();
}
Contributor


Unrelated change

Member Author


This PR is based on another one (PR 94959) which hasn't been merged yet. Once the other PR is merged, this one will be rebased and this change will disappear. This was done so that our automated testing covers these changes together.

@michalpaszkowski
Member Author

@efriedma-quic @arsenm I do agree that this should not be needed and that ideally the intrinsics should be covered in the standard GlobalISel flow, but we need those expanded prior to the SPIRVEmitIntrinsics pass for special handling we have there for aggregates. Unfortunately, the problem is not as simple to solve; this patch simply extends the existing expansion we had for umul to the remaining operations.
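
For context, the aggregate-returning form under discussion is the standard call-plus-extractvalue pattern; this fragment is lifted from the new smul test in the diff above:

  %smul = tail call { i8, i1 } @llvm.smul.with.overflow.i8(i8 %a, i8 %b)
  %cmp = extractvalue { i8, i1 } %smul, 1
  %smul.value = extractvalue { i8, i1 } %smul, 0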

@michalpaszkowski
Member Author

The major (sole) role of the SPIRVPrepareFunctions pass is to prepare and expose all aggregate-related instructions and intrinsics to the SPIRVEmitIntrinsics pass, where:

// This pass performs the following transformation on LLVM IR level required
// for the following translation to SPIR-V:
// - replaces direct usages of aggregate constants with target-specific
//   intrinsics;
// - replaces aggregates-related instructions (extract/insert, ld/st, etc)
//   with a target-specific intrinsics;

@arsenm
Contributor

arsenm commented Jun 11, 2024

for special handling we have there for aggregates.

These shouldn't need treatment as aggregates, they will IRTranslate to a return of tuple?

@michalpaszkowski
Member Author

michalpaszkowski commented Jun 11, 2024

for special handling we have there for aggregates.

These shouldn't need treatment as aggregates, they will IRTranslate to a return of tuple?

We generate InsertValue instructions which we directly replace in the SPIRVEmitIntrinsics. We want all of these to "survive" IRTranslation.

@efriedma-quic
Collaborator

Unfortunately, the problem is not as simple to solve; this patch simply extends the existing expansion we had for umul to the remaining operations.

I'd prefer to solve this sooner rather than later. I'm not so much worried about the size of this particular patch, but this seems like it's going to be a continual maintenance burden for target-independent changes.

We generate InsertValue instructions which we directly replace in the SPIRVEmitIntrinsics. We want all of these to "survive" IRTranslation.

Probably the SPIRV backend should be doing calling convention lowering inside GlobalISel, instead of trying to do it as a pre-pass. Calling convention lowering is all target-specific anyway, so you can generate whatever you need.

@michalpaszkowski
Member Author

Unfortunately, the problem is not as simple to solve; this patch simply extends the existing expansion we had for umul to the remaining operations.

I'd prefer to solve this sooner rather than later. I'm not so much worried about the size of this particular patch, but this seems like it's going to be a continual maintenance burden for target-independent changes.

We generate InsertValue instructions which we directly replace in the SPIRVEmitIntrinsics. We want all of these to "survive" IRTranslation.

Probably the SPIRV backend should be doing calling convention lowering inside GlobalISel, instead of trying to do it as a pre-pass. Calling convention lowering is all target-specific anyway, so you can generate whatever you need.

I do agree at a high level; there might be some additional details which I do not remember right now, but I will make sure to experiment with this a bit more and discuss it in the upcoming SPIR-V backend sync-up on Monday. The direction and goal, though, is to address this issue. Also, simplification here would only benefit us going forward.

For this particular patch, if possible, let's continue with the expansion prior to IRTranslation. I will hold off on merging this patch until the end of the day in case there are any additional comments.

Still, the implementation will need to be changed in the next patch to use native SPIR-V OpIAddCarry and OpISubBorrow.

@efriedma-quic
Collaborator

If it would be significantly more convenient for you to land this and then look at the underlying issue, I guess I won't block that. But I'd consider fixing it properly a blocker for exiting "experimental" status.

@michalpaszkowski
Member Author

michalpaszkowski commented Jun 12, 2024

This particular case (the llvm.x.with.overflow intrinsics) can probably be reworked in its entirety using native SPIR-V instructions (coming in another patch, as this requires further testing with our non-standard-compliant SPIR-V consumer). Given that we have a 1:1 mapping, we won't need to do anything related to aggregates prior to IRTranslation for these builtins.

However, the majority of the logic in SPIRVPrepareFunctions and SPIRVEmitIntrinsics related to handling aggregates is not so simple to rework -- if this is what you mean -- and I am not entirely sure it is even possible to handle some cases after IRTranslation without too much inefficiency and unnecessary complexity.

@efriedma-quic Could you please elaborate on what exactly you consider to be a blocker for exiting "experimental" status? Also, would you be available to join our SPIR-V backend sync-up meeting this upcoming Monday to discuss this? Alternatively, we could discuss this over email if that works better. We would gladly take any tips on how we could rework our pre-IRTranslation passes. SPIR-V is really on a similar level to LLVM IR, and things that apply to other targets do not fit as easily here.

CC: @VyacheslavLevytskyy @iliya-diyachkov

@efriedma-quic
Collaborator

Could you please elaborate on what exactly you consider to be a blocker for exiting "experimental" status?

Specifically, the fact that it's impossible for an intrinsic with multiple results to reach GlobalISel.

The reason why we want SPIR-V lowering to go through GlobalISel in the first place is so that we can take advantage of shared code: legalization and lowering optimizations don't need to be specifically coded for SPIR-V. If a large fraction of intrinsics can't ever be lowered to GlobalISel operations, we lose that: those intrinsics will all need to be special-cased in SPIR-V-specific code.

I expect the simplest way to make that work would be to just reconstruct aggregates inside GlobalISel... but if you can come up with some other approach to exclude intrinsics from aggregate lowering, that's okay, I guess.

@efriedma-quic
Collaborator

I'll plan to attend the meeting next Monday to discuss.
